Add retry middleware in query-frontend #814
Changes look good.
My concern is whether this change could end up hammering a slow or overloaded querier and make the problem worse. It might be worth considering a backoff between retries, or some other mitigation.
In Cortex, the querier pulls work from the queue in the query-frontend. The number of queries a querier processes concurrently is limited by MaxConcurrentQueries. BTW, this PR is pretty much the same code Cortex uses in its queryrange processor, which shards requests by day.
one nit. nice addition.
modules/frontend/retry.go
Outdated
}

span.LogFields(ot_log.String("msg", "error processing request"), ot_log.Int("try", triesLeft), ot_log.Error(err))
continue
continue?
does this do anything?
Well, it continues the for loop 😛
I've removed it. I refactored my loop a couple of times for clarity.
Thanks for the explanation
What this PR does:
Requests from the query-frontend to the querier might fail when a querier is starting or stopping. Simply retrying this request usually solves this.
This PR adds a retry middleware that retries a request (up to 5 times by default) if it failed (HTTP 500 or an error).
Which issue(s) this PR fixes:
Related to #761
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]